feat(rivetkit): expose low-cardinality metrics routes #5019
PR #5019 Review: feat(rivetkit): expose low-cardinality metrics routes
Overview
This PR makes two interconnected changes:
Security Concerns
In the runtime HTTP handler:
// Before: checked RIVETKIT_METRICS_ENABLED + RIVETKIT_METRICS_TOKEN bearer token
// After:
let metrics = crate::metrics_endpoint::render_prometheus_metrics()?;
bytes_response(StatusCode::OK, &metrics.content_type, metrics.body)
The Similarly, the new
Breaking Changes
Metric renames will break existing dashboards and alerts
Code Quality
No-op methods are dead code, not backwards-compatibility shims
Several methods are converted to empty no-ops:
pub(crate) fn set_queue_depth(&self, _depth: u32) {}
pub(crate) fn set_active_connections(&self, _count: usize) {}
pub(crate) fn set_lifecycle_inbox_depth(&self, _depth: usize) {}
// etc.
Per CLAUDE.md, backwards-compatibility hacks like this should be avoided. If the metric is intentionally dropped, its call sites should be removed as well. Keeping empty stubs is misleading: it looks as though the metrics are still tracked when they are not.
impl Default for ActorMetrics {
fn default() -> Self {
Self::new("") // increments actor_active_count{actor_name=""}
}
}
This creates a live Prometheus label entry with
Both call sites pass
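The phantom-series problem with `ActorMetrics::default()` delegating to `new("")` is easy to reproduce in isolation. This is a minimal sketch, not the PR's code: `ACTOR_ACTIVE_COUNT`, `gauge_add`, and `gauge_value` are hypothetical in-memory stand-ins for the real Prometheus registry, and the `Drop` decrement is an assumption about how the real type maintains its gauge. Only the `new` / `Default` shape and the `actor_active_count{actor_name=""}` label come from the diff.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// Hypothetical stand-in for a labeled gauge such as
// actor_active_count{actor_name="..."}: maps each label value to its count.
static ACTOR_ACTIVE_COUNT: Mutex<BTreeMap<String, i64>> = Mutex::new(BTreeMap::new());

// Adjusts one series; as with a labeled gauge, touching a new label
// value creates that series on first use.
fn gauge_add(actor_name: &str, delta: i64) {
    let mut series = ACTOR_ACTIVE_COUNT.lock().unwrap();
    *series.entry(actor_name.to_string()).or_insert(0) += delta;
}

// Returns the current value of one series, or None if it was never created.
fn gauge_value(actor_name: &str) -> Option<i64> {
    ACTOR_ACTIVE_COUNT.lock().unwrap().get(actor_name).copied()
}

struct ActorMetrics {
    actor_name: String,
}

impl ActorMetrics {
    fn new(actor_name: &str) -> Self {
        gauge_add(actor_name, 1);
        Self { actor_name: actor_name.to_string() }
    }
}

// Assumption: the real type decrements its gauge when dropped.
impl Drop for ActorMetrics {
    fn drop(&mut self) {
        gauge_add(&self.actor_name, -1);
    }
}

// The pattern under review: constructing a default value is enough
// to register an actor_name="" series.
impl Default for ActorMetrics {
    fn default() -> Self {
        Self::new("")
    }
}

fn main() {
    let real = ActorMetrics::new("counter");
    let phantom = ActorMetrics::default();
    println!("while alive: {:?}", ACTOR_ACTIVE_COUNT.lock().unwrap().clone());
    drop(phantom);
    drop(real);
    // In this stand-in the "" series is still present (at 0) after the value is gone.
    println!("after drop: {:?}", ACTOR_ACTIVE_COUNT.lock().unwrap().clone());
}
```

Deleting the `impl Default` turns every accidental `ActorMetrics::default()` into a compile error, so each caller must supply a real actor name and the empty-label series cannot appear.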
The old code iterated all candidates; the new code silently takes only the first. A comment explaining the single-candidate assumption would help.
Observability Concerns
Removed per-actor granularity is intentional but irreversible
Replacing instance-level labels with just
The old
Serverless no-envoy health is optimistic
None => health_response(200, "ok", &version),
When serverless mode has no envoy, health returns
Test Coverage
Minor
Summary
The low-cardinality direction is the right call for production metrics systems. The main blockers are the removed auth on the runtime HTTP handler and the missing auth on the new public
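Both blockers named in the summary amount to gating the metrics routes again. Here is a minimal sketch, assuming the gate keeps the pre-PR inputs noted in the security section (RIVETKIT_METRICS_ENABLED plus a RIVETKIT_METRICS_TOKEN bearer token). `MetricsAuth`, `check_metrics_auth`, `constant_time_eq`, the accepted values "1"/"true", the fail-closed behavior when no token is set, and the 404/401 statuses are all hypothetical choices, not the pre-PR implementation.

```rust
use std::env;

/// Outcome of the (hypothetical) metrics auth gate.
#[derive(Debug, PartialEq, Eq)]
enum MetricsAuth {
    /// Metrics are turned off: behave as if the route does not exist.
    Disabled,
    /// Enabled, but the request did not present the configured bearer token.
    Unauthorized,
    /// Enabled and the bearer token matched.
    Allowed,
}

impl MetricsAuth {
    /// HTTP status to return instead of rendering; None means "go ahead".
    fn rejection_status(&self) -> Option<u16> {
        match self {
            MetricsAuth::Disabled => Some(404),
            MetricsAuth::Unauthorized => Some(401),
            MetricsAuth::Allowed => None,
        }
    }
}

/// Compares two byte strings without short-circuiting on the first
/// mismatching byte. Length is still observable.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Pure decision function, testable without touching the process environment.
/// `enabled` / `token` mirror RIVETKIT_METRICS_ENABLED / RIVETKIT_METRICS_TOKEN;
/// `authorization` is the raw Authorization header value, if any.
fn check_metrics_auth(
    enabled: Option<&str>,
    token: Option<&str>,
    authorization: Option<&str>,
) -> MetricsAuth {
    // Assumption: "1" or "true" turns metrics on; anything else is off.
    if !matches!(enabled, Some("1") | Some("true")) {
        return MetricsAuth::Disabled;
    }
    // Fail closed: enabled without a non-empty token rejects every request.
    let Some(expected) = token.filter(|t| !t.is_empty()) else {
        return MetricsAuth::Unauthorized;
    };
    match authorization.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(presented) if constant_time_eq(presented.as_bytes(), expected.as_bytes()) => {
            MetricsAuth::Allowed
        }
        _ => MetricsAuth::Unauthorized,
    }
}

/// Reads the two environment variables named in the review and applies the gate.
fn check_metrics_auth_from_env(authorization: Option<&str>) -> MetricsAuth {
    let enabled = env::var("RIVETKIT_METRICS_ENABLED").ok();
    let token = env::var("RIVETKIT_METRICS_TOKEN").ok();
    check_metrics_auth(enabled.as_deref(), token.as_deref(), authorization)
}

fn main() {
    let outcome = check_metrics_auth(Some("1"), Some("s3cret"), Some("Bearer s3cret"));
    // prints "Allowed -> None"
    println!("{:?} -> {:?}", outcome, outcome.rejection_status());
    println!("from env: {:?}", check_metrics_auth_from_env(None));
}
```

Keeping the decision pure lets the runtime HTTP handler and the new public route share one gate: each would call it before `render_prometheus_metrics()` and return the rejection status when one is set, and the gate can be unit-tested without mutating environment variables.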
cb4ffa1 to e646699
3a9c1be to 72886b4